Bridging Corpus for Russian in comparison with Czech
Abstract
In this paper, we present a syntactic approach to the annotation of bridging relations, so-called genitive bridging. We introduce RuGenBridge, a corpus for Russian annotated with genitive bridging, and compare this approach with the semantic approach applied in the Prague Dependency Treebank for Czech. We discuss some special aspects of bridging resolution for Russian and the specifics of bridging annotation for languages in which definite nominal groups are less frequent than in, e.g., Romance and Germanic languages. To verify the consistency of our method, we carry out two comparative experiments: annotating a small portion of our corpus with bridging relations according to both approaches, and finding, for every relation in RuGenBridge, the semantic interpretation that would be annotated for Czech.
Similar resources
Portable Language Technology: Russian via Czech
We report on morphological tagging of Russian using very limited Russian resources. We train the TnT tagger (Brants, 2000) on a modified Czech corpus to get the transition probabilities. We believe that the two languages are similar enough for the transitional information to be useful. The Russian emission symbols are obtained using a morphological analyzer that does not rely on a manually crea...
A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources
In this paper, we describe a resource-light system for the automatic morphological analysis and tagging of Russian. We eschew the use of extensive resources (particularly, large annotated corpora and lexicons), exploiting instead (i) pre-existing annotated corpora of Czech; (ii) an unannotated corpus of Russian. We show that our approach has benefits, and present what we believe to be one of th...
Statistical Machine Translation Between Related and Unrelated Languages
In this paper we describe an attempt to compare how relatedness of languages can influence the performance of statistical machine translation (SMT). We apply the Moses toolkit on the Czech-English-Russian corpus UMC 0.1 in order to train two translation systems: Russian-Czech and English-Czech. The quality of the translation is evaluated on an independent test set of 1000 sentences parallel in ...
Experiments in Cross-Language Morphological Annotation Transfer
Annotated corpora are valuable resources for NLP which are often costly to create. We introduce a method for transferring annotation from a morphologically annotated corpus of a source language to a target language. Our approach assumes only that an unannotated text corpus exists for the target language and a simple textbook which describes the basic morphological properties of that language is...
Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets
The paper deals with corpus-based methods applied to the particular tasks of lexical database construction. Different techniques of the corpus analysis are discussed and their applicability for the tasks is assessed. Corpus management system Manatee + Bonito developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that enables to perform all di...